    Applying Rule Ensembles to the Search for Super-Symmetry at the Large Hadron Collider

    In this note we give an example application of a recently presented predictive learning method called Rule Ensembles. The application we present is the search for super-symmetric particles at the Large Hadron Collider. In particular, we consider the problem of separating the background coming from top quark production from the signal of super-symmetric particles. The method is based on an expansion over base learners, each learner being a rule, i.e. a combination of cuts in the variable space describing signal and background. These rules are generated from an ensemble of decision trees. One result of the method is a set of rules (cuts) ordered according to their importance, which provides a useful tool for diagnosing the model. We also compare the method to a number of other multivariate methods, in particular Artificial Neural Networks, the likelihood method, and the recently presented boosted decision tree method. We find better performance with Rule Ensembles in all cases. For example, for a given significance, the amount of data needed to claim SUSY discovery could be reduced by 15% using Rule Ensembles compared to using a likelihood method.
    Comment: 24 pages, 7 figures, replaced to match the version accepted for publication in JHEP
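
    As a rough illustration of the rule-generation and ranking scheme described above, the following minimal sketch (not the authors' implementation; the dataset, model choices, and parameters are placeholders) mines rules from the leaves of a shallow tree ensemble and orders them with a sparse linear fit:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.linear_model import LogisticRegression

        # Placeholder "events"; in the paper these would be kinematic
        # variables describing SUSY signal versus top-quark background.
        X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

        # 1) Generate candidate rules: every leaf of every shallow tree is one
        #    rule, i.e. a conjunction of cuts in the input variables.
        ensemble = GradientBoostingClassifier(n_estimators=50, max_depth=3,
                                              random_state=0).fit(X, y)
        leaves = ensemble.apply(X)[:, :, 0]    # leaf index per event and tree

        # 2) Encode each event by indicator features, one per rule (leaf).
        rule_features = np.concatenate(
            [(leaves[:, [t]] == np.unique(leaves[:, t])).astype(float)
             for t in range(leaves.shape[1])], axis=1)

        # 3) Fit a sparse linear expansion over the rules; the coefficient
        #    magnitudes give an importance ordering of the rules (cuts).
        fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        fit.fit(rule_features, y)
        order = np.argsort(-np.abs(fit.coef_[0]))
        print("most important rule indicators:", order[:10])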

    Reproducing Kernels of Generalized Sobolev Spaces via a Green Function Approach with Distributional Operators

    In this paper we introduce a generalized Sobolev space by defining a semi-inner product formulated in terms of a vector distributional operator $\mathbf{P}$ consisting of finitely or countably many distributional operators $P_n$, which are defined on the dual space of the Schwartz space. The types of operators we consider include not only differential operators, but also more general distributional operators such as pseudo-differential operators. We deduce that a certain appropriate full-space Green function $G$ with respect to $L := \mathbf{P}^{\ast T}\mathbf{P}$ becomes a conditionally positive definite function. In order to support this claim we ensure that the distributional adjoint operator $\mathbf{P}^{\ast}$ of $\mathbf{P}$ is well-defined in the distributional sense. Under sufficient conditions, the native space (reproducing-kernel Hilbert space) associated with the Green function $G$ can be isometrically embedded into, or even be isometrically equivalent to, a generalized Sobolev space. As an application, we take linear combinations of translates of the Green function, with possibly added polynomial terms, and construct a multivariate minimum-norm interpolant $s_{f,X}$ to data values sampled from an unknown generalized Sobolev function $f$ at data sites located in some set $X \subset \mathbb{R}^d$. We provide several examples, such as Matérn kernels or Gaussian kernels, that illustrate how many reproducing-kernel Hilbert spaces of well-known reproducing kernels are isometrically equivalent to a generalized Sobolev space. These examples further illustrate how we can rescale the Sobolev spaces by the vector distributional operator $\mathbf{P}$. Introducing the notion of scale as part of the definition of a generalized Sobolev space may help us to choose the "best" kernel function for kernel-based approximation methods.
    Comment: Updated version of the publication in Numer. Math., close to Qi Ye's Ph.D. thesis (http://mypages.iit.edu/~qye3/PhdThesis-2012-AMS-QiYe-IIT.pdf)
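
    For concreteness, the minimum-norm interpolant mentioned above takes the standard form for conditionally positive definite kernels (a sketch in the abstract's notation, assuming $N$ data sites $x_1, \dots, x_N \in X$ and a basis $p_1, \dots, p_Q$ of the attached polynomial space):

        s_{f,X}(x) = \sum_{j=1}^{N} c_j \, G(x - x_j) + \sum_{k=1}^{Q} d_k \, p_k(x),

    with the coefficients fixed by the interpolation and moment conditions

        s_{f,X}(x_i) = f(x_i), \quad i = 1, \dots, N, \qquad \sum_{j=1}^{N} c_j \, p_k(x_j) = 0, \quad k = 1, \dots, Q.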

    Dimensionality reduction and prediction of the protein macromolecule dissolution profile

    A suitable regression model for predicting the dissolution profile of poly(lactic-co-glycolic acid) (PLGA) micro- and nanoparticles can play a significant role in pharmaceutical/medical applications. The rate of dissolution of proteins is influenced by several factors, and taking all such influencing factors into account, we have a dataset in hand with three hundred input features. Therefore, a primary step before identifying a regression model is to reduce the dimensionality of the dataset at hand. On the one hand, we have adopted backward-elimination feature selection techniques for an exhaustive analysis of the predictability of each combination of features. On the other hand, several linear and non-linear feature extraction methods are used in order to extract a new set of features out of the available dataset. A comprehensive experimental analysis for the selection or extraction of features and identification of the corresponding prediction model is offered. The designed experiment and prediction models offer substantially better performance than the prediction models proposed earlier in the literature for the said problem.
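
    A minimal sketch of the backward-elimination step (the estimator and problem sizes are assumptions; the paper's dataset has three hundred input features, shrunk here so the sketch runs quickly):

        from sklearn.datasets import make_regression
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.linear_model import Ridge

        # Placeholder regression data standing in for the PLGA dissolution set.
        X, y = make_regression(n_samples=300, n_features=50, n_informative=10,
                               noise=0.1, random_state=0)

        # Backward elimination: start from all features and drop them one at a
        # time, keeping the subset with the best cross-validated score.
        selector = SequentialFeatureSelector(Ridge(alpha=1.0),
                                             n_features_to_select=10,
                                             direction="backward", cv=5)
        selector.fit(X, y)
        print("retained feature indices:", selector.get_support(indices=True))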

    Isometric Sliced Inverse Regression for Nonlinear Manifolds Learning

    Sliced inverse regression (SIR) was developed to find effective linear dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we present isometric SIR for nonlinear dimension reduction, a hybrid of the SIR method and geodesic distance approximation. First, the proposed method computes the isometric distance between data points; the resulting distance matrix is then sliced according to K-means clustering results, and the classical SIR algorithm is applied. We show that isometric SIR (ISOSIR) can reveal the geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We report and discuss this novel method in comparison to several existing dimension-reduction techniques for data visualization and classification problems. The results show that ISOSIR is a promising nonlinear feature extractor for classification applications.
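
    The three steps above can be sketched as follows (an illustrative reconstruction, not the authors' code; the Swiss-roll-like data, neighborhood size, and slice count are assumptions):

        import numpy as np
        from scipy.linalg import eigh
        from scipy.sparse.csgraph import shortest_path
        from sklearn.cluster import KMeans
        from sklearn.neighbors import kneighbors_graph

        rng = np.random.default_rng(0)                 # Swiss-roll-like toy data
        t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, 500)
        X = np.column_stack([t * np.cos(t), rng.uniform(0, 10, 500),
                             t * np.sin(t)])

        # 1) Isometric (geodesic) distances along a k-nearest-neighbor graph.
        knn = kneighbors_graph(X, n_neighbors=10, mode="distance")
        D = shortest_path(knn, directed=False)

        # 2) Slice the distance matrix with K-means on its rows.
        slices = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(D)

        # 3) Classical SIR: between-slice covariance of the slice means,
        #    solved against the total covariance of the centered data.
        Xc = X - X.mean(axis=0)
        Sigma = Xc.T @ Xc / len(X)
        M = np.zeros_like(Sigma)
        for s in range(10):
            m = Xc[slices == s].mean(axis=0)
            M += (slices == s).mean() * np.outer(m, m)
        evals, evecs = eigh(M, Sigma)                  # generalized eigenproblem
        directions = evecs[:, np.argsort(evals)[::-1]][:, :2]
        print("leading SIR directions:\n", directions)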

    Bayesian Kernel Methods

    Learning with Kernels

    Classification in a normalized feature space using support vector machines
